Mon. Not. R. Astron. Soc. 000, 000-000 (0000) Printed 5 February 2008 (MN MgX style file vl.4) 



Searching for the scale of homogeneity 



oo 

OS 



(N 
> 

o 
o 

oo 

Oh 
6 



13 



Vicent J. Martmez, 1 * Maria-Jesus Pons-Borderia, 2 t Rana A. Moyeed 3 -!- 
and Matthew J. Graham 1 ' 4 ^ 

1 Departament d'Astronomia i Astrofisica, Universitat de Valencia, E-^6100 Burjassot, Valencia, Spain 
2 Departamento de Fisica Teorica, Universidad Autdnoma de Madrid, 28049 Cantoblanco, Madrid, Spain 
3 School of Mathematics and Statistics, University of Plymouth, Drake Circus, Plymouth PL4 8AA, U.K. 
4 Centre for Astrophysics, University of Central Lancashire, Preston PR1 2HE, U.K. 



Accepted 1997 ???? ??. Received 1997 ???? ??; in original form 1996 ???? ?? 



ABSTRACT 

We introduce a statistical quantity, known as the K function, related to the in- 
tegral of the two-point correlation function. It gives us straightforward information 
about the scale where clustering dominates and the scale at which homogeneity is 
reached. We evaluate the correlation dimension, D2, as the local slope of the log-log 
plot of the K function. We apply this statistic to several stochastic point fields, to 
three numerical simulations describing the distribution of clusters and finally to real 
galaxy redshift surveys. Four different galaxy catalogues have been analysed using this 
technique: the Center for Astrophysics I, the Perseus-Pisces redshift surveys (these 
two lying in our local neighbourhood), the Stromlo-APM and the 1.2 Jy IRAS red- 
shift surveys (these two encompassing a larger volume). In all cases, this cumulant 
quantity shows the fingerprint of the transition to homogeneity. The reliability of the 
estimates is clearly demonstrated by the results from controllable point sets, such as 
the segment Cox processes. In the cluster distribution models, as well as in the real 
galaxy catalogues, we never see long plateaus when plotting D2 as a function of the 
scale, leaving no hope for unbounded fractal distributions. 

Key words: methods: statistical; galaxies: clustering; large-scale structure of Uni- 
verse 



1 INTRODUCTION 

The standard cosmology is based on the assumption that 
the Universe must be homogeneous on very large scales. Sev- 
eral pieces of evidence support this assumption: the homo- 
geneity and isotrop y of the microwave background radiation 
(3moot et al. 1992) and some asp ects of the large scale dis- 
tribution of matter (Peebles 1989) seem to strongly advocate 
uniformity on scales bigger than about 200 h' 1 Mpc (where 
H = 100h km s" 1 Mpc" 1 ). 

However the presence of very large features in the 
galax y distribution like the Bootes void (|Kirshner et al 



prejudice not necessarily supported by the observational ev- 
idence quoted above. They def end the alternative idea of a n 
unbounded fractal cosmology (Coleman & Pietronero 1992). 



Guzzo (1997) argues against this interpretation on the basis 
of a careful handling of the data. 

The spatial two-point correlation function is the sta- 
tistical tool mainly used to describe the clustering in the 
Universe (Peebles 1980, 1993). However, because of the in- 
tegral constraint (Peebles 1980), one cannot estimate it at 



1981) or the Great Wall (Seller fc Huchra 198S) which span 



a scale length of the order of 100 h~ Mpc calls the actual 
scale of homogeneity into question. Moreover other authors 
consider the assumption of homogeneity just a theoretical 



very large distances from the currently available redshift sur- 
veys. In order to study clustering in the regime where it is 
not very strong, we have only two possibilities: either we 
extend the size of the redshift catalogues or we use alterna- 
tive statistical descriptors. The approach described in this 
paper points in the latter direction. In the same line, other 
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authors (Fisher et al 1993; Park et al. 1994; Tadros fc Ef- 



stathiou 1996 ) have tried to measure the power-spectrum 
on large scales directly from galaxy catalogues. Einasto & 
Gramman (1993) studied the transition to homogeneity by 
means of the power-spectrum and found a relation between 
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the correlation transition scale and the spectral transition 
scale (turnover in P(k)). 

We introduce the quantity called K(r), which is related 
to the correlation function £(r). The novelty of our approach 
lies essentially in the fact that we shall use a cumulant quan- 
tity instead of a differential quantity such as Although 
for a point process the functions £(r) and K(r) are well de- 
fined, what we measure from the galaxy catalogues are just 
estimators of those functions. One of our main claims is that 
the estimators for K{f) are more reliable than the most cur- 
rently used estimators for £(r) and that makes its use recom- 
mendable (especially in three-dimensional processes and at 
large scales) despite its somewhat less informative character. 



2 THE K FUNCTION AND THE 
CORRELATION DIMENSION 

Within the field of the statistical analysis of point fields, new 
techniques and estimators of the clustering of spatial point 
patterns have been developed in the past decades. Unfor- 
tunately, the connection between this set of scholars and 
cosmologists is not as important today as it was in the late 
fifties when the Berkeley statisticians Neymann and Scott 
carried out an intensive programme about the analysis of 
the Lick catalogue (Neymann & Scott 1952, 1955). Today, 
one of the most popular summary statistics for point pat- 
terns is the 7f-function (Bartlett 1964, Ripley 1976, 1977, 
1981). Let us introduce this quantity using the terminology 
employed by statisticians and stressing its connections with 
the quantities used in cosmology (see also Szapudi & Sza- 
lay 1997 for a different application of Ripley's statistic to 
cosmology) . 

Let us consider a point process acting on a region 
D C R A with volume V whose output is a collection of 
positions of N galaxies (or clusters of galaxies) {x;}. If we 
take two infinitesimal volumes dVi and dVi around xi and 
X2 respectively, the joint probability of there being a point 
lying in each of these volumes reads: 



dP = A 2 (xi,X2)<ftW2, 



(1) 



where A2 is the so-called second order intensity function 
( Diggic 1983| ). If the process is stationary (invariant under 
translation) and isotropic (invariant under rotation), then 
A2(xi,X2) = A2(|xi — X2I). The two-point correlation func- 
tion can be expressed by means of it as: 



Aa(r) 



(2) 



where r = |xi — x 2 | and n is the mean number density in a 
fair sample. 

The second-order cumulative function K(r) is denned 
so that the expected number of neighbours a given galaxy 
will have at a distance less than r is nK(r). Therefore its 
relation with the two-point correlation function is 



K(r) = / 4tts 2 (1 +£(s))ds (3) 
Jo 

For a homogeneous Poisson process this function is just 
^Po is (r) = |r 3 . (4) 



2.1 Relation with other cumulant quantities 

Other second-order cumulant functions have been used in 
the statistical analysis of the large scale structure in the 
Universe. Within the context of the scaling or multifractal 
approach the partition function Z(q, r) introduced in the de- 
scription of the galaxy clustering by Martinez et al. (1990) is 
formally related with K (r) for the second moment (q — 2) 
by Z(2,r) — K(r)/V where V is the volume of the sam- 
ple. Borgani et al. (1994) perform an exhaustive analysis of 
the cluster distribution in both real catalogues of clusters 
of galaxies and simulations by means of the Z(q, r) func- 
tion. In that paper the authors give an expression for us- 
ing this partition function when a selection function has to 
be considered. The dependence of the results of Z(q, r) of 
the particular volume of the sample has led some authors 
(Dommguez-Tenreiro, Gomez-Flechoso and Martinez 1994) 
to normalize Z(2, r) in order to get a function Z nolln which 
coincides for different samples (within different volumes) in 
the homogeneous regime. Basically this normalization makes 
Znorm equivalent to K(r). 

Peebles (1980) introduced the moments of the counts 
of neighbours [N) r as the mean count of objects in balls 
of radius r excluding the central one. By definition {N) r 
is the correlation integral C(r) used by Martinez et al. 
(1995) in the analysis of the multiscaling properties of the 
matter distribution and is related with K{r) simply by 
{N) r — nK(r) — C(r). In fact, other authors (Martinez 
& Coles 1994; McCauley 1997) have chosen the normaliza- 
tion for the partition function Z(q, r) in such a way that 
(N) r = Z(2,r) directly. 

Taking the shape of the K function for a random ob- 
ject distribution (Poisson process) into account, one can just 
consider the difference between K(r) and Kp i s (r) which 
leads to another cumulant quantity commonly used in the 
statistical description of the galaxy clustering, 4n Js(r) = 
K{r) - K Pois {r), where (Peebles 1980, 1993) 

Js(r)= f £(s)s 2 ds. (5) 
Jo 

Finally, if one considers the quotient instead of the dif- 
ference, one gets the integral quantity used by Coleman & 
Pietronero (1992): 



nK(r) 



(6) 



(see also Cappi et al. (1998) for a discussion on the methods 
used by Pietronero and collaborators). 

The advantages of the use of K(r) with respect to other 
cumulant quantities are the following: 

(i) K(r) is well normalized. We can compare directly the 
K function of different samples, with different number den- 
sity and within different volumes without extra normaliza- 
tion. 

(ii) K(r) is a well-known quantity in the field of spatial 
statistics and several analytical results regarding its shape 
and variance are already available for a variety of point pro- 
cesses. 

(iii) It is very important to estimate the quantity K(r) di- 
rectly from the data and not through numerical integration 
of 1 + £(r), which introduces artificial smoothing of the re- 
sults. Several edge-corrected unbiased estimators are avail- 
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able for K(r). In the context of the present application, the 
most appreciable properties an estimator must hold are to 
have little variance and not to introduce spurious homogene- 
ity by means of the edge-correction. In the next subsection 
we comment on different estimators for K(r). 



2.2 Estimators 

We shall make the assumption that the process under con- 
sideration is stationary and isotropic. Such a process is als o 
referred to as 'statistically homogeneous' (Peebles 1993). 
Note, however, that when in this paper we talk about 'the 
scale of homogeneity', we mean the scale at which the spatial 
distribution of the objects is uniform or indistinguishable 
from a homogeneous Poisson process. Nevertheless, some 
cluster processes are still stationary and isotropic. 

There exist several estimators for K. A comparison of 
some of them can be found in Doguwa & Upton (1989). 
From the definition of K and ignoring the edge effects one 
could consider the following naive estimator 



K] 



N N 



i=l 3=1 



where 6 is Heaviside's step function, whose value is 1 when 
the argument is positive and otherwise. Obviously for a fi- 
nite sample this estimator will provide values for K smaller 
than the true values since neighbours outside the bound- 
aries are not considered. One possible solution is to consider 
only points in an inner region as centres of the balls for 
counting neighbours. The points lying in the outer region, a 
buffer zone (Upton & Fingleton 1987; Buchert & Martinez 
1993), take part in the estimator just as points which could 
be seen as neighbours at a given distance r of the points 
in the inner region. The inner region might shrink as r in- 
creases. However, this solution leads to biases (the sample 
is not uniformly selected), wastes a lot of data ; and obvi- 
ously increa ses the variance of the estimator (Doguwa & 
Upton 1989). The standard solution adopted in the statisti- 



cal studies of the large-scale structures is to account for the 
unseen neighbours outside the sample window by means of 
the following edge-corrected estimator 



Figure 1. An illustration of the weights used in the estimator 
of K (equation 9) in 2 dimensions. The rectangle represents the 
boundary of the sample. In this case, Wij is the proportion of 
the circumference of the circle centered at x^, passing through 
Xj, lying within the boundary of the sample. Depending on the 
relative positions of the galaxies with respect to the boundary, 



different cases are illustrated: (a) Wi 



1; (b) Wij = 1, 



Wji < 1; (c) Wij < 1, Wji < 1. It is clear from the plot that 
we weight the observed neighbour Xj of the galaxy x^ lying at a 
distance r (the radius of the circle) from it by the inverse of the 
probability that such a neighbour would be observed. 



where the weight ujij is an edge correction equal to the pro- 
portion of the area of the sphere centred at and passing 
through Xj that is contained in D; in other words, LOij is the 
conditional probability that the j'th point is observed given 
that it is at a distance r from the ith point. This correc- 
tion is suitable for stationary and isotropic processes and 
is illustrated in Fig. jjj. The unweighted K function will be 
negatively biased because we do not observe events outside 
the sampling window, so the observed counts from events 
which are less than a distance r from the boundary will be 
artificially low. 

It is still possible that the best estimator depends of the 
kind of point process to be studied (clustered or regular) and 
even on the particular scale range, (see Stoyan & Stoyan 
(1998) for a discussion on improved estimators). We have 
chosen the estimator Kn(r) because of its well known good 
performance in a variety of cases. 

In Baddeley et al. (1993) an analytic expression for u>ij 
is given in the case D is a cube. In order to ensure that the 
border correction is as free of error as possible, we have cho- 
sen to generate the synthetic samples we want to analyse in 
a region shaped in this way. Note however that we introduce 
a certain bias when we estimate n through N/V but we are, 
on the other hand, making full use of all sample points. For 
all the mentioned estimators it is possible to build the corre- 
sponding versions for flux-limited samples by simply adding 
a weighting factor representing the selection function. 



K DV (r) 



V_ 
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EE 



r - X,; - x,- 



3=1 
3^i 



Mr) 



(8) 



where fi (r) is the fraction of the volume of the sphere of ra- 
dius r centred on the object i which falls within the bound- 
aries of the sample. This kind of edge-correction has been 
used by Borgani et al. (1994) when calculating the parti- 
tion function Z(q, r) used in the multifractal analysis. In the 
field of spatial statistics it had been introduced by Doguwa 
and Upton (1989). Although this e stimator has good prop - 
erties, it could be slightly biased ( stoyan & Stoyan 1994 ) . 



The most commonly used unbiased edge-corrected estima- 
tor in the analysis of point process es is Ripley's estimat or, 
which under our hypotheses reads (Baddeley et al. 1993): 



N N 



i=l 3=1 



9{r- 



i\) 



UJij 



(9) 



2.3 The correlation dimension 

If scaling of the first moment of the count of neighbours 
holds, then C(r) is proportional to r D2 and K(r) as well, 
since n is constant when looking at the whole region. The 
exponent D2 is known as the correlation dimension and it 
tends to 3 on the scale r at which homogeneity is reached, 
hence its importance. 

Once we compute K, we obtain the local dimension 
D2 as the slope of a five-point log-log linear regression on 
the function K(r) (as in Borgani et al. 1994). This local 
correlation dimension can be considered as a sliding window 
estimate (t hrough the scale) of the fractal dimension (Dubuc 
et al. 198S| ) . Any long plateau for a significant range of scales 



will be the fingerprint of a fractal range. The tendency to 
homogeneity will be described by an increasing behaviour of 
the local dimension D2 as a function of the scale, towards 
the value consistent with homogeneity D2 = 3. This test is 
much stronger than just to fit a straight line to a log-log 
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Figure 2. One of the realizations of each of the stochastic point 
processes analysed here: Poisson, Soneira-Peebles and Voronoi 
models described in Section 3. The side length is 300 h~ 1 Mpc. 



Figure 3. The functions K(r) and D2(r) for the point processes 
i), ii) and iii). The straight continuous line corresponds to -R"p i s - 



plot, which could lead to wrongly interpret as fractals sets 
which clearly are not, as we show in Section 5.3 (see also 
Stoyan 1994, McCauley 1997). 



3 THE K FUNCTION ON STOCHASTIC 
POINT PROCESSES 

In order to test the usefulness of the statistic K and the 
accuracy of its estimator [equation ([])] we have calculated 
K and D2 for three different point samples. They are con- 
trollable point processes for which we have some a priori 
knowledge of the expected behaviour of the K function. In 
order to compute as well the deviations from this theoretical 
behaviour when the empirical value is obtained on a point 
sample, we have analysed several realizations of each pro- 
cess, simulated in a cube of side 300 ft -1 Mpc. Let us briefly 
describe the models analysed here. 

i) Poisson sample. We generate 10 of these samples con- 
taining 1000 points each in order to test if our quantities 
behave in the expected way in the absence of any pattern. 

ii) Soneira-Peebles model (S-P). Starting from a sphere 
of radius 7? and following Soneira & Peebles (1978), we ran- 
domly place 77 spheres of radius R/\. In each of the result- 
ing smaller spheres we do the same with spheres of radius 
R/\ 2 . We repeat this L times and consider the centres of 
the rj L spheres in the last level as our study objects. Al- 
though Soneira & Peebles (1978) add up N r realizations of 
this kind, we simply generate one, which is known to be a 
single fractal (Martinez et al. 1990). The model is included 
inside a cube of side 300 h~ Mpc. The values of the pa- 
rameters we have chosen are r\ — 2, A = 2, L — 10, and 
we have not allowed spheres in the same level to overlap. 
The expected value of D2 can be analytically calculated for 
this model through the relation D2 = (log rj) / (log A) and, 
for this choice of the parameters, it turns out to be D2 = 1. 
We have generated 10 of such samples. 



iii) Vo ronoi sample. Essentially it is built (van de Wey- 



gaert 1991) by choosing points at random as centres of the 
cells of a Voronoi tessellation, which is produced by draw- 
ing planes equidistant to the nearest centres. Then a certain 
amount of objects (events of the point process) following a 
Gaussian distribution are put on and around the filaments 
which result of the intersection of two of those planes. We 
have analysed 5 realizations containing around 2500 points 
each. 

All these point processes are shown in Fig. || and now 
we shall comment on the results of the it-function for the 
first three point processes, which are graphically represented 
in Fig. Ea. For each sample, we plot the average of the func- 
tion K(r) calculated over the 10 realizations in the range 
of scales [10, 100] ft -1 Mpc, except for the Voronoi sample, 
which begins at r = 1 h -1 Mpc. We plot the corresponding 
standard deviation as error bars. These are usually smaller 
than the size of the dot, probing thus the stability of the esti- 



mator. The dotted line represents the theoretical behaviour 
for a Poisson process. Let us mention that the depth up to 
which a statistic can be reliably calculated depends on the 
statistic itself and on the estimator used as well as on the 
geometry of the region. In our case, we have calculated K(r) 
for r G [5, 150] h^ 1 Mpc and it follows the same trend as in 
the shorter interval [10,100] h -1 Mpc that we have plotted, 
except for more significant noise at very small scales. On the 
top of each panel we show the local dimension D2 calculated 
as stated in Section 2. Note that we do not calculate D2 bas- 
ing ourselves on the K of each individual sample but on the 
average K of all the samples of a same process. Error bars 
are now 1 a uncertainties in the five-point weighted least 
squares fit. 

In the estimation of the K function, we have checked 
that the lack of border correction (i.e., if we take uiij = 
1 Vi,j) would underestimate significantly the amount of 
clustering, giving a false sense of regularity in the data. 

Let us first point out that K has fully accomplished 
its mission as a detector of the scale of homogeneity. For 
the Poisson distribution we see that the estimator of Kir) 
matches quite well the expected behaviour of equation (^). 
Moreover the value of the local dimension is nearly constant 
for the whole range of scales and D2 = 3, with little fluctu- 
ations at small scales due to the small number of points. 

The other two models clearly show clustering, since the 
function K(r) > Kpd a (r). Note that homogeneity begins 
to be reached when the K curve of the sample falls on the 
theoretical straight line. The simple S-P clump is a pure 
fractal and hence it never reaches homogeneity; on the plot, 
this corresponds to the fact that the sample and theoretical 
lines do not coincide over the whole range of scales, being 
both straight lines with different slopes. The local dimension 
D2 as a function of the scale r is practically a horizontal line 
at the theoretically predicted height, D2 — 1. This plateau 
is the fingerprint of fractal behaviour. The lacunarity of this 
fractal set, appreciated in Fig. 1, is responsible for the im- 
portant error bars in the plot of D2. For larger values of 
r the estimates of K and hence of D2 are biased by the 
edge correction, which assumes that the process is station- 
ary and isotropic. Nevertheless for r < 80 Mpc they 
give the correct result although the point process does not 
fulfill the required assumptions, showing that the method 
does not introduce spurious homogenization. 

The different prescriptions used to generate the anal- 
ysed point processes are reflected in the different shapes of 
K and in the behaviour of D2 as a function of the scale. The 
Voronoi model used here is based on the population of the 
filaments; this can be seen in the fact that, for the smallest 
scales shown in the plot (r < 8 ft -1 Mpc), the dimension 
D2 — 1, corresponding to filamentary-like structures. At 
large scales we can appreciate a continuous variation of the 
local dimensionality from D2 — 1 to D2 — 3, corresponding 
to the scale at which homogeneity is reached (r bonl ~ 50 h~ 
Mpc). 
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Figure 4. The different iV-body simulations of the cluster dis- 
tribution. The side of the box is 300 h~ 1 Mpc. 



Figure 6. The same as Fig. 3 for the three simulations of the 
cluster distribution 



Figure 5. The standard correlation function £(r) for the three 
cluster samples. We have plotted the average values over 10 real- 
izations (only 8 for MDM) and the error bars correspond to 1 a 
deviations. 



Figure 7. The average and standard deviations of the K func- 
tion for 8 subcubes of side 150 h _1 Mpc extracted from one of 
the MDM realizations. The dotted line corresponds to the whole 
simulation. 



4 THE K FUNCTION ON SIMULATIONS OF 
THE CLUSTER DISTRIBUTION 

We have furthermore analysed three sets of 10 A-body sim- 
ulations of the distribution of clusters of galaxies (Croft & 
Efstathiou 1994a, 1994b). They have used a P 3 M code to 
generate these samples, containing around 1000 clusters each 
of them in a cube of comoving side 300 h~ Mpc. 

i) Standard CDM. It has been generated from a CDM 
power spectrum that applies for models with low baryon 
density. The chosen values of the cosmological parameters 
are Q = 1 and h — 0.5. 

iii) Mixed dark matter (MDM). It contains a mas- 
sive neutrino component and CDM in a proportion so that 
fitotal — 1 and — 0.3, taking h — 0.5. 

iii) CDM with cosmological constant (A-CDM). The 
values of the parameters are Q, = 0.2, h = 1 and A/(3Hq) = 
0.8. 

All these simulations are shown in Fig. ^ 
In Fig. [B| we have included, for the sake of comparison, 
the mean and 1 a errors of £ (r) for all the realizations of each 
A-body simulation. The estimator used is that of Rivolo 
(1986) 



l+f(r) = 



£ Ni(r) 

i=l 



(10) 



with Ni(r) being the number of neighbours of the ith point 
in a shell centred at that point and having radii r — dr/2 
and r + dr/2, and Vt being the volume of the intersection 
of that shell with the region D. The re exists an analytica l 
expression for Vi when D is a cube (Baddeley et al. 1993). 
The estimator in equation ( |lo[ ) gives smaller errors than the 
usual Davis & Peebles (1983) estimator. This question is 
being addressed at length in Pons-Borderia et al. (1998). 
In any case, the errors at large distances (r > 10 h^ 1 Mpc) 
are very important. It is precisely in that region where it is 
especially interesting to use K, since £ is dominated by the 
noise there and, on the contrary, errors on K are acceptable 
up to at least 1/3 of the box sidelength. Although K is not 
as informative as £, in the same way that a distribution 
function is less informative than a density, it is better to 
have some information than to have none. 

In Fig. ^| we present the results of the K function and 
the local dimension D2 for the cluster simulations. It is quite 
evident that no clear plateau D2 is observed, although two 
different regimes in its behaviour are appreciated. At small 
scales, (r < 30 — 40 ft -1 Mpc), D2 increases with the scale 



very slowly, with some cases showing behaviour compatible 
with a constant value of D2 within the uncertainties. Never- 
theless, the systematic increasing trend is observed for the 
three cases. In the second regime, for r > 40 h -1 Mpc, the 
tendency to attain homogeneity is more clearly appreciated 
and D2 increases more rapidly with the scale. This might be 
a good probe in order to test a possible fractal character of 
the geometry of the Universe. 

It is interesting to observe how the MDM model and 
A-CDM have a similar effect with respect to the standard 
CDM, namely the increase of the amount of clustering at 
all scales and the delay of the achievement of homogeneity 
(D2 — 3). The scale at which this happens is 70/i _1 Mpc for 
MDM and A-CDM instead of 50/i _1 Mpc for CDM. Note 
that the stronger the clustering is, the larger the values of the 
K function. We see that, among the three A-body simula- 
tions, the strongest clustering is observed in MDM, followed 
by the A-CDM model and finally by the standard CDM. 
Regarding the dimension D2, standard CDM and A-CDM 
show similar values at small scales, while at large scales the 
agreement is between the MDM and the A-CDM models. K 
is none the less not able to distinguish between MDM and 
A-CDM. 

In order to test the stability of this statistic when ap- 
plied to smaller regions, we have subdivided one of the cubes 
(of 300 ft" 1 Mpc sidelength) containing a MDM simulation 
in 8 smaller cubes of 150 ft -1 Mpc sidelength. We have cal- 
culated the K function for each of the subcubes up to 50 
h^ 1 Mpc and, as we can see in Fig. the statistic and its 
estimator are quite stable, since the results for the whole 
sample lie within the 1 a deviation from the average of K 
calculated on the smaller cubes. This is an illustration of the 
robustness of the estimator. 



5 THE K FUNCTION ON GALAXY 
REDSHIFT SURVEYS 

5.1 The surveys 

In order to check if the K function produces reliable re- 
sults when used on real data sets, we have applied it to four 
different galaxy catalogues. In all cases we have extracted 
volume-limited samples which are shown in Fig. |^ redshifts 
had been corrected from the heliocentric velocity with re- 
spect to the microwave background. Let us briefly describe 
the analysed samples: 
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5.1.1 CfA-I 

The CfA-I catalogue with magnitude limit tub = 14.5, com- 
piled by Huchra et al. (1983) was based on Zwicky's angu- 
lar catalog. We have extracted a complete volume-limited 
sample, which we shall call CfA80, lying inside the region 
delimited by galactic latitude b > 40°, declination 8 > 0, dis- 
tance to the Earth greater than 17/i _1 Mpc and less than 
80/i -1 Mpc. It contains 185 galaxies, being the Coma clus- 
ter the most prominent feature in the sample. The volume 
comprised by it is ~ 3.13 x 10 5 (/i _1 Mpc) 3 and subtends a 
solid angle of uj — 1.832605 sr. 



Figure 8. Equal-area projections of some of the volume-limited 
samples analysed here: a) PPS (the north celestial pole is at the 
centre of the plot); b) The volume— limited sample of the Stromlo- 
APM redshift survey (now the south galactic pole is at the centre 
of the plot); c) Aitoff projection of the volume-limited sample of 
the IRAS 1.2 Jy survey (in galactic coordinates). 



5.1.2 Perseus-Pisces Supercluster (PPS) 

The Perseus-Pisces catalogue has been compiled by Gio- 
vanelli & Haynes (1991), taking as a basis the old Zwicky 
catalogue. Its magnitude limit is raz = 15.7 and it covers (in 
equatorial coordinates) the region a 6 [22 h ,24 h ] U [0\4 h ], 
5 G [0°,50°]. The most important feature contained in it is 
the Perseus-Pisces filamentary supercluster. The catalogue 
magnitudes have been corrected from interstellar absorp- 
tion. 

The extraction from this catalogue of a volume-limited 
sample has been performed by several authors, (see Ghigna 
et al. (1994) and references therein). We use the volume- 
limited sample extracted by Kerscher et al. (1997). They 
have neglected the zone most affected by galactic obscura- 
tion and restricted the sample to the area a G [22' l .5,24' l ]U 
[0 h ,3 h ], S G [0°,40°]. The depth is 79/T 1 Mpc, it contains 
817 galaxies and covers a solid angle of 0.76 sr, being the 
volume ~ 1.24 x 10 s (ft -1 Mpc) 3 . 

5.1.3 Stromlo-APM 

The Stromlo-APM redshift survey was compiled by Loveday 
et al. (1996), b ased on the APM Bright Galaxy Catalogue 
(Loveday 1996). It consists of 1797 galaxies with a magni- 
tude limit of bj < 17.15 selected randomly at a rate of 1 in 
20. The survey covers 4300 square degrees of the southern 
galactic sky, approximately defined in equatorial coordinates 
by a G [2l\ 24 h ] U [0 h , 5 h ], S G [-72.5°, -17.5°]. The survey 
redshifts have been transformed to the Local Group refer- 
ence frame and if-corrections have been applied for different 
morphological types in the bj system. 

We have extracted a sample volume-limited to 200/i -1 
Mpc (assuming go = 0.5), consisting of 387 galaxies. Dis- 
tances are calculated according to the Mattig formula which 
for this choice reads as: 



2c 



(11) 



The fact that the sample is not complete is not a prob- 
lem for the calculation of K since the function K is invariant 
under thinning (van Lieshout & Baddeley 1996) . 

5.1.4 IRAS 1.2 Jy 

The IRAS 1.2 Jy survey was compiled by Fisher et al. (1995) 
based on the IRAS Point Source Catalogue (PSC) by Beich- 
man et al. (1985). It contains 5321 galaxies and is complete 
to a flux limit of 1.2 Jy at 60/im. The survey covers 87.6% of 



Figure 9. The K function for the CfA80 sample, the PPS sample 
and the Stromlo-APM survey. On top we show the local dimen- 
sion Di (only where the correlation coefficient was at least 0.97) 
as a function of the scale r. 



the sky, excluding only the Galactic plane region | 6 |< 5°, 
a small region of sky not surveyed by IRAS and confused 
regions in the PSC. 

We have extracted a sample volume-limited to 120/i _1 
Mpc from the survey. It contains 561 galaxies, comprising 
270 galaxies in the northern galactic hemisphere and 291 
galaxies in the southern hemisphere. 



5.2 Results 

We have typically analysed the K function up to 1/2 of 
the cubic root of the volume of each sample. The weighing 
term o>jj in the denominator of the estimator Ka in equa- 
tion fjj), which depends upon the geometry, is now mea- 
sured by Monte Carlo integration since the shapes of the 
galaxy samples are not simple cubes but something much 
more complicated. 



5.2.1 Optical samples 

In Fig. ^| a plot of the results for the optically selected sam- 
ples can be seen. On the top panel the results for the correla- 
tion dimension D2 are shown, calculated where the function 
K fits reasonably well a power-law. 

In the CfA80 sample, due to its shallowness, we have 
calculated K only up to ~ 30/i _1 Mpc but it is enough to 
witness its transition to homogeneity. It is also remarkable 
the rapid change of its D2, which increases from 1.3 to al- 
most 2.5 in less than 10fo _1 Mpc. 

The Perseus-Pisces survey presents clustering at all the 
scales we could analyse but it tends to homogeneity with in- 
creasing r, since its K value tends to Kp i s . The PPS sample 
is contained in too a small region to allow us inspection of 
K at very large scales. We have calculated the K function 
just up to 25 h~ x Mpc. Although it could be possible to 
fit a single power-law to the behaviour of the K function 
over the whole range with reasonable accuracy (Guzzo et 
al. 1991; Bonometto et al. 1994; Pietronero, Montuori & 
Sylos-Labini 1996), we are able to detect a decrease in the 
amount of clustering through the use of the local dimension 
D2, which goes from 1.8 at scales around 1 Mpc to 2.3 
at scales around 20 h~ l Mpc. In order to go beyond 20 h~ 
Mpc we have to analyse the other deeper samples. 
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5.3 Reliability of the results 



Figure 10. The K function for the volume-limited samples of 
the 1.2 Jy IRAS catalogue. We use different marks for the North 
and South Galactic hemispheres. The differences between both 
arc significant at small scales. 



One way of measuring the errors on the estimates of a sta- 
tistical function such as K(r) is based on the dispersion 
of its measures when applied to ensembles of artificial cat- 
alogues having similar statistical properties (Fish er et al. 



1993; Hamilton 1993). The segment Cox processes (Stoyan 



Kendall & Mecke 1995) are quite useful in this context. The 



At small scales the values of K for the Stromlo-APM 
sample are noisier due to the sparseness of the sample. In 
the range [5-25] h~ x Mpc, however, where the estimates are 
more reliable, there is a reasonable agreement among the 
results for the three samples. 

It is interesting to notice that there is a kind of conti- 
nuity between the D2 values for the three optical samples. 
One can form an increasing curve which begins at small 
scales sampled by CfA80 with a value D2 — 1.3, joins the 
value D2 — 2.2 of PPS at intermediate scales and finally ap- 
proaches the homogeneity value D2 — 3 with APM (which 
is the deepest sample), ruling out the idea of an unbounded 
fractal universe. This result is in agreement with a recent 
paper by Scaramella el at. (1998) in which they found evi- 
dences for a D2 — 3 dimensionality for the ESO Slice project 
redshift survey and the Abell (ACO) clusters, using, as we 
do, comovil cosmological distances and /^-corrections. Sim- 
ilar conclusions are reached by Wu, Lahav & Rees (1998) 
from the analysis of the X-ray Background and the Cosmic 
Microwave Background. 

The increasing value of D2 with the scale is less evi- 
dent, although appreciable, for the PPS sample, in which 
the behaviour is a smooth increasing trend with oscillations 
that might be misinterpreted as a constant. In any case, it 
is interesting to remark that this survey has been selected 
to isolate one strong feature in the local Universe (Iovino et 
al. 1993), the Perseus-Pisces supercluster, a very big sheet- 
like structure, which contributes strongly to values close to 
2 for the correlation dimension. However, as we have seen 
with the other analysed samples, this behaviour is particu- 
lar of this sample due to a selection effect, and cannot be 
extrapolated to the whole Universe. 



5.2.2 IRAS 1.2 Jy 

Here we have analysed separately both galactic hemispheres 
by means of the K function. In Fig. |Io|we see significant dif- 
ferences between the K function of the North and the South 
samples up to 20 ft -1 . The clustering seems stronger in the 
Southern hemisphere than in the Northern one. This result 
corroborates the findings of Kerscher et al. (1997). These 
authors found differences in the strength of the clustering 
between both hemispheres by means of the Minkowski func- 
tional and nearest-neighbour distributions. Also at small 
scales the values of K are typically larger for the IRAS 1.2 
Jy sample than for the optical samples analysed previously. 
This result is in full agreement with the estimates of the 
correlation function at small scales reported in Fisher et al. 
(1994), Bonometto et al. (1994) and Loveday et al. (1995). 
However, at larger scales the K functions of both hemi- 
spheres approach in the same way the curve corresponding 
to a uniform Poissonian distribution. 



particular model of Cox process (see also Pons-Borderia et 
al. 1998) used here has been generated in the following way: 
we scatter randomly in the space segments of length I, with 
A 3 being the mean number of segments per unit volume and, 
on those segments, we randomly distribute points so that a 
chosen A; be the mean number of points per unit length of 
the segments. The intensity of this process will be A = A;A S Z 
and, as proven in Stoyan, Kendall & Mecke (1995), its K 
function will be given by: 



K(r) 



#r 3 + 4- iir>l 



(12) 



and one will be able to calculate D2 analytically simply as: 



D 2 {r) = 



r dK(r) 



K{r) 



dr 



(13) 



We have generated 10 such Cox processes containing 
between 1400 and 1600 points each inside a cube of side 100 
h" 1 Mpc. The values we have taken for the parameters are 
I = 20, \ a = 4 x 10~ 5 ,A; = 1.88, A ~ 1500/100 3 . 

In Fig. [ll] we can see the average empirical values of 
the estimates of K(r) obtained by means of equation Q to- 
gether with the standard deviations. The expected theoreti- 
cal function given by equation ([l2]) is depicted as a solid line. 
As we can see, an empirical estimate of K calculated on 10 
Cox processes reproduces quite satisfactorily the expected 
theoretical behaviour. Note that the variance of the num- 
ber of counts in a bounded set for a Cox process is always 
bigger than in a Poisson process having the same intensity 
( [gtoyan, Kendall fc Mecke 1995 ). It is important to notice 



that the border correction has not destroyed the goodness of 
the estimator; in particular, it has not introduced spurious 
homogeneization. Our estimator has worked successfully not 
only in the "easy" case of absence of structure which repre- 
sent Poisson processes but it has also been able to reproduce 
quite exactly the very precise value of K for a clustered Cox 
process. This test gave us enough confidence to believe that 
the K results obtained from the galaxy samples effectively 
reflect the structure existing there. 

The Cox processes used here also play an interesting 
role. They are a good example with which to prevent the 
naive use of fractals in the analysis of point fields as it has 
been anticipated by Stoyan (1994). In the inset of Fig. O, we 
can see the function £(r) expected for this process. The func- 
tion is the sum of two power-laws £ (r) = Ar~ 2 + Br -1 with 
A = (2tt\ s I)~ 1 and B = — (27rA s / 2 ) -1 . At short distances 
the first power-law dominates and therefore the function 
£(r) can be nicely fitted to a power-law £(r) tx r -7 with 
7 = 2 and this could lead to interpret that the point set 
is a fractal when clearly it is not (Stoyan 1994; McCauley 
1997). Because in this regime £(r) ^> 1, the same can be 
said for the function 1 -+- £(r). Looking at the top panel of 
Fig. [n], we can see the behaviour of the empirical local di- 
mension D2 calculated over the average of the 10 realizations 
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Figure 11. Bottom panel: the average and 1 <r deviation error 
bars of the function K(r) for 10 realizations of the Cox point 
processes (solid discs). The inset shows the two-point correlation 
function of this stochastic model. Top panel: the local correlation 
dimension D2 with 1 a uncertainties calculated by means of a 
five-point weighted log-log least square fit on the average of K. 
In both panels the solid line shows the theoretical values, while 
the dotted line in the bottom panel corresponds to KpoisM- 



of the Cox processes together with the 1 a deviations. The 
solid line represents the expected theoretical values [equa- 
tion @]. Again we can see the reliability of the estimates. 
But what it is more interesting in this example is the long 
plateau observed in the plot of the correlation dimension. 
The value D2 — 3 — 7 ~ 1 remains nearly constant for a 
broad range of scales, due to the particular behaviour of the 
K function for this model. After the "fractal" behaviour, 
a transition to homogeneity is clearly appreciated in both 
D2 and K(r). It is interesting to remark the qualitatively 
similar behaviour between this figure and Fig. ^ in which 
we showed the same function for the analysed optical red- 
shift surveys: a regime at small scales where the clustering 
is strong (with K 3> ifpois) and where K can be fitted to 
a power-law. At larger scales, however the increasing be- 
haviour of the local dimension D2 with the scale and the 
continuous approximation of the function K to Kp i S are 
absolutely clear. At this point we want to remark that, in 
the same way that the term fractal is not appropriate for 
the Cox process, even having a co rrelation fun ction decay- 
ing as a power-law at short scales ( stoyan 1994 ) , the galaxy 
distribution, even hold ing a similar pr operty, is not a frac- 
tal in a rigorous sense ( McCauley 1997). Howev er in a more 
loose use of the term fractal (Avnir et al. 1998), it could be 
appropriate to talk about a "fractal" regime to describe the 
range of scales where K(r) follows a power-law, bearing in 
mind that a real self-similar point pattern, for example the 
Soneira-Peebles model described in section 3, verifies other 
conditions (self-similarity) apart from a power-law decaying 
correlation function. 



6 CONCLUSIONS 

We should like also to comment briefly on the relation of K 
with the correlation function f(r). Both play their role in 
the analysis of the point pattern and, as Stoyan & Stoyan 
(1996) say, their relation is similar to that between the dis- 
tribution function and the probability density function in 
classical statistics. The use of a cumulative quantity such 
as K avoids binning i n distance, which is often a source 
of arbitrariness for £ ( Ripley 1992| ). Let us explain why £ 
does suffer from the hindrance of splitting the information 
into disjoint bins. When one estimates £(r) in [r, r + dr], it is 
assumed that within that bin the correlation function is con- 
stant, and since this is obviously not true, the larger the bin 
the larger the error, but we cannot make arbitrarily small 
the size dr of the bin, because in that case we would not find 
any pairs. In other words, £(r) has an additional source of 
bias, not present in K, due to the smoothing caused by av- 



eraging over pairs of points close to but not exactly r units 
apart of each other (Stein 1996). 

The correlation length (ro|£(ro) = 1) is j us ^ the scale 
at which the density of galaxies is, on average, twice the 
mean number density. At smaller scales the pair correlations 
are due to non-linear perturbations, but homogeneity is not 
reached till £(rh om ) ~ 0. The main interest of K is that it 
permits us to study clustering precisely in that "difficult" 
range where ro < r < r^ om _, which cannot be reached by 
£ because in this range the errors on the estimates of £ are 
comparable with their values, while the difference K — Kp i s 
is still meaningful. 

As a concluding remark, we want to stress that an un- 
biased estimator of a quantity related with the correlation 
integral, known as the K function, has been applied to cos- 
mological simulations and galaxy samples. This function, 
extensively used in the field of spatial statistics, provides a 
nice measure of clustering. The border correction used here 
does not waste any data points and does not introduce spu- 
rious homogeneization, giving reliability to the evaluation 
of this function at large scales. Through the slope of K we 
are able to calculate D%, which is an indicator of a possi- 
ble fractal behaviour of the point process at a given scale 
range. The clear physical meaning of K and D2 helps us 
easily interpret the clustering properties of different models 
of structure formation at different scales. 

Regarding the analysis of the galaxy redshift surveys, 
we have seen that the estimator of the K function is robust 
in the sense that it does not depend on the shape of the study 
region and provides us with reliable information about the 
point patterns over a wide range of scales. The behaviour of 
the local dimension D2 for the real galaxy samples is par- 
ticularly interesting to proponents of various fractal models 
of large-scale structure. If a constancy of D2 with the scale 
is a necessary condition for having a fractal point pattern 
(although it should not be sufficient as we have seen with 
the Cox process [see also Stoyan (1994) for more examples]), 
it is a neat conclusion of our analysis that the galaxy dis- 
tribution does not even hold the necessary condition. The 
analysis presented here will provide a conclusive test to dis- 
cover the scale at which the distribution of the matter in the 
Universe is really homogeneous when applied, in the near fu- 
ture, to the bigger and deeper galaxy catalogues which will 
be soon ready for common use. 
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